HDCat: Effectively Identifying Hot Data in Large-Scale I/O Streams with Enhanced Temporal Locality

Authors

Jiahao Chen

Yuhui Deng

Zhan Huang

Abstract

Identifying hot data is important for optimizing modern computer systems; for example, identified hot data can be used to extend the lifespan of flash memory. However, it is challenging to identify hot data effectively with low memory consumption and low runtime overhead. This paper proposes a Hot Data Catcher (HDCat), which identifies hot data in large-scale I/O streams by leveraging enhanced temporal locality. HDCat maintains only a hot data queue and a candidate hot data queue, recording the data access pattern by tracking a limited data set and thereby reducing memory consumption. Furthermore, HDCat adopts a D-bit counter and a recency bit to exploit both the frequency and the recency information contained in the data stream. HDCat also significantly reduces the number of conversions between hot data and cold data. Real traces are used to evaluate the proposed approach. Experimental results demonstrate that HDCat significantly outperforms the state-of-the-art Multi-hash algorithm and the two-level LRU algorithm.
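The abstract names the building blocks (a hot data queue, a candidate hot data queue, a D-bit counter, and a recency bit) but does not give the promotion or demotion rules. The following Python sketch shows one plausible way these pieces could fit together; it is not the authors' algorithm. The class name, the `promote_at` threshold, the second-chance eviction scan, and the halving of the counter on demotion are all assumptions made for illustration.

```python
from collections import OrderedDict


class TwoQueueHotDataSketch:
    """Illustrative two-queue hot data tracker (NOT the published HDCat algorithm)."""

    def __init__(self, hot_capacity=4, candidate_capacity=8, d_bits=2, promote_at=2):
        # Each queue maps a block address to [counter, recency_bit],
        # ordered from least to most recently touched.
        self.hot = OrderedDict()
        self.candidate = OrderedDict()
        self.hot_capacity = hot_capacity
        self.candidate_capacity = candidate_capacity
        self.max_count = (1 << d_bits) - 1  # a D-bit counter saturates at 2^D - 1
        self.promote_at = promote_at        # assumed promotion threshold

    def _touch(self, queue, lba):
        # Record frequency (saturating counter) and recency (single bit).
        entry = queue[lba]
        entry[0] = min(entry[0] + 1, self.max_count)
        entry[1] = 1
        queue.move_to_end(lba)

    def _insert_candidate(self, lba, entry):
        # Drop the oldest candidate when the candidate queue is full.
        if len(self.candidate) >= self.candidate_capacity:
            self.candidate.popitem(last=False)
        self.candidate[lba] = entry

    def _evict_from_hot(self):
        # Assumed policy: second-chance scan. An entry whose recency bit is set
        # has the bit cleared and is re-queued; the first entry found with a
        # cleared bit is demoted to the candidate queue with a halved counter.
        while True:
            lba, entry = self.hot.popitem(last=False)
            if entry[1]:
                entry[1] = 0
                self.hot[lba] = entry
            else:
                entry[0] >>= 1
                self._insert_candidate(lba, entry)
                return

    def access(self, lba):
        """Record one access and report whether the block is now classified hot."""
        if lba in self.hot:
            self._touch(self.hot, lba)
        elif lba in self.candidate:
            self._touch(self.candidate, lba)
            if self.candidate[lba][0] >= self.promote_at:
                entry = self.candidate.pop(lba)
                if len(self.hot) >= self.hot_capacity:
                    self._evict_from_hot()
                self.hot[lba] = entry
        else:
            self._insert_candidate(lba, [1, 1])
        return lba in self.hot
```

With the default `promote_at=2`, the first access to a block places it in the candidate queue and returns `False`, and the second access promotes it and returns `True`. Because both queues have fixed capacities, memory use is bounded regardless of how many distinct addresses appear in the stream, which is the property the abstract emphasizes.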


Similar resources

Efficient Mining of Temporal High Utility Itemsets from Data streams

Utility itemsets are considered as the different values of individual items as utilities, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets with support larger than a pre-specified threshold in current time window of data stream. Discovery of temporal high utility itemsets is an important process for mining interesting p...


Data Cube Indexing of Large-Scale Infosec Repositories

Analysts examining large-scale information security repositories for propagating network events are interested in quickly identifying temporal and spatial (IP address and/or port) regions containing interesting phenomena, or correlating events from different time periods. The size of these datasets strains current query capabilities provided by, for example, relational databases. We introduce a...


On Clustering Massive Data Streams: A Summarization Paradigm

In recent years, data streams have become ubiquitous because of the large number of applications which generate huge volumes of data in an automated way. Many existing data mining methods cannot be applied directly on data streams because of the fact that the data needs to be mined in one pass. Furthermore, data streams show a considerable amount of temporal locality because of which a direct a...


Power-Efficient Memory Bus Encoding Using Stride-Based Stream Reconstruction

With the rapid increase in the complexity of chips and the popularity of portable devices, the performance demand is not any more the only important constraint in the embedded system. Instead, energy consumption has become one of the main design issues for contemporary embedded systems, especially for I/O interface due to the high capacitance of bus transition. In this paper, we propose a bus e...


Statistical Traffic State Analysis in Large-scale Transportation Networks Using Locality-Preserving Non-negative Matrix Factorization

Statistical traffic data analysis is a hot topic in traffic management and control. In this field, current research focuses on analyzing traffic flows of individual links or local regions in a transportation network. Less attention is paid to the global view of traffic states over the entire network, which is important for modeling large-scale traffic scenes. Our aim is precisely to p...




Publication date: 2015
